# A tibble: 13 × 2
year cases
<int> <int>
1 1981 12
2 1982 14
3 1983 33
4 1984 50
5 1985 67
6 1986 74
7 1987 123
8 1988 141
9 1989 165
10 1990 204
11 1991 253
12 1992 246
13 1993 240
Lecture 05
Family: poisson
Links: mu = log
Formula: cases ~ year
Data: aids (Number of observations: 13)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -397.20 15.76 -428.45 -366.18 1.00 1585 2153
year 0.20 0.01 0.19 0.22 1.00 1588 2153
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
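Written out, the model being fit here (Poisson family with a log link, per the summary above) is

\[
\begin{aligned}
Y_i &\sim \text{Pois}(\lambda_i)\\
\log E(Y_i|\boldsymbol{X}_{i\cdot}) &= \log(\lambda_i) = \underset{1 \times p}{\boldsymbol{X}_{i\cdot}}\underset{p \times 1}{\boldsymbol{\beta}} \\
\\
E(Y_i) &= Var(Y_i) = \lambda_i
\end{aligned}
\]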
predicted_draws_fix(g_bayes, newdata = aids) |>
group_by(.chain, .row) |>
summarize(
rmse = yardstick::rmse_vec(cases, .prediction),
crps = dukestm::calc_crps(.prediction, cases)
) |>
group_by(.chain) |>
summarize(
rmse = mean(rmse),
crps = mean(crps)
)

# A tibble: 4 × 3
.chain rmse crps
<int> <dbl> <dbl>
1 1 26.4 17.8
2 2 26.3 17.7
3 3 26.3 17.7
4 4 26.4 17.6
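`dukestm::calc_crps` is a course helper; assuming it computes the standard empirical CRPS for a sample of posterior draws, CRPS(F, y) = E|X - y| - 0.5 E|X - X'|, a minimal sketch is below (the name `crps_empirical` is ours, and the course version may differ):

```r
# Empirical CRPS for a vector of draws x against a single observation y:
#   CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|
# Lower is better; for a single point prediction it reduces to absolute error.
crps_empirical = function(x, y) {
  mean(abs(x - y)) - 0.5 * mean(abs(outer(x, x, "-")))
}

crps_empirical(c(0, 2), 1)  # E|X - 1| = 1, 0.5 E|X - X'| = 0.5, so CRPS = 0.5
```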
predicted_draws_fix(g_bayes, newdata = aids) |>
group_by(.row, cases) |>
ggdist::mean_hdi(
.prediction, .width = c(0.5,0.9,0.95)
) |>
mutate(contains = cases >= .lower & cases <= .upper) |>
group_by(.width) |>
summarize(
emp_cov = sum(contains)/n()
)

# A tibble: 3 × 2
.width emp_cov
<dbl> <dbl>
1 0.5 0.231
2 0.9 0.385
3 0.95 0.462

Even at the 95% level the empirical coverage is only about 0.46, so the linear Poisson model badly undercovers.
( g2_bayes = brms::brm(
  cases ~ year + I(year^2), data = aids, family = poisson,
  refresh = 0, backend = "cmdstanr"
) )

Family: poisson
Links: mu = log
Formula: cases ~ year + I(year^2)
Data: aids (Number of observations: 13)
Draws: 3 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 3000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 5654.70 7977.68 -789.24 17091.80 3.69 3 11
year -0.40 0.76 -1.48 0.30 4.11 3 11
IyearE2 -0.00 0.00 -0.00 0.00 3.84 3 11
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

These chains have clearly not converged (Rhat ≈ 4, effective sample sizes in the single digits): with raw years near 1990, year^2 is on the order of 4 million and nearly collinear with year, which makes the posterior badly conditioned. Centering year before squaring fixes this.
( g2_bayes = brms::brm(
  cases ~ I(year - min(year)) + I((year - min(year))^2),
  data = aids, family = poisson,
  refresh = 0, backend = "cmdstanr"
) )

Family: poisson
Links: mu = log
Formula: cases ~ I(year - min(year)) + I((year - min(year))^2)
Data: aids (Number of observations: 13)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 2.43 0.15 2.14 2.71 1.00 1101 1370
IyearMminyear 0.51 0.04 0.44 0.60 1.00 1116 1364
IyearMminyearE2 -0.02 0.00 -0.03 -0.02 1.00 1183 1583
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
predicted_draws_fix(g2_bayes, newdata = aids) |>
group_by(.chain, .row) |>
summarize(
rmse = yardstick::rmse_vec(cases, .prediction),
crps = calc_crps(.prediction, cases)
) |>
group_by(.chain) |>
summarize(
rmse = mean(rmse),
crps = mean(crps)
)

# A tibble: 4 × 3
.chain rmse crps
<int> <dbl> <dbl>
1 1 14.1 5.02
2 2 14.3 5.11
3 3 14.1 5.12
4 4 14.5 5.18
predicted_draws_fix(g2_bayes, newdata = aids) |>
group_by(.row, cases) |>
tidybayes::mean_hdi(
.prediction, .width = c(0.5,0.9,0.95)
) |>
mutate(contains = cases >= .lower & cases <= .upper) |>
group_by(.width) |>
summarize(
emp_cov = sum(contains)/n()
)

# A tibble: 3 × 2
.width emp_cov
<dbl> <dbl>
1 0.5 0.615
2 0.9 1
3 0.95 1

Coverage is now at or above the nominal level, and both RMSE and CRPS have dropped substantially relative to the linear model.
This is another case of a generalized linear model, specifically where the outcome is 0-1 data (i.e. Bernoulli draws),
\[
\begin{aligned}
Y_i &\sim \text{Bern}(p_i)\\
\text{logit} \, E(Y_i|\boldsymbol{X}_{i\cdot}) &= \text{logit}(p_i) = \underset{1 \times p}{\boldsymbol{X}_{i\cdot}}\underset{p \times 1}{\boldsymbol{\beta}} \\
\\
E(Y_i) &= p_i \\
Var(Y_i) &= p_i(1-p_i) \\
\\
\text{logit}(p_i) &= \log \frac{p_i}{1-p_i} \\
\text{logit}^{-1}(x) &= \frac{\exp(x)}{1+\exp(x)} = \frac{1}{1+\exp(-x)}
\end{aligned}
\]
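Both link functions are easy to write directly (a small sketch; base R already provides these as `qlogis()` and `plogis()`):

```r
logit     = function(p) log(p / (1 - p))
inv_logit = function(x) 1 / (1 + exp(-x))  # equivalently exp(x) / (1 + exp(x))

inv_logit(logit(0.25))  # recovers 0.25
inv_logit(0)            # 0.5: a log-odds of 0 corresponds to probability 1/2
```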
Next we’ll be looking at data on the presence and absence of the short-finned eel (Anguilla australis) at a number of sites in New Zealand.
These data come from
presence - presence (1) or absence (0) of Anguilla australis at the sampling location
SegSumT - Summer air temperature (degrees C)
DSDist - Distance to coast (km)
DSMaxSlope - Maximum downstream slope (degrees)
USRainDays - Days per month with rain greater than 25 mm
USSlope - Average slope in the upstream catchment (degrees)
USNative - Area with indigenous forest (proportion)
DSDam - Presence of known downstream obstructions, mostly dams
Method - Fishing method (electric, net, spot, trap, or mixture)
LocSed - Weighted average of proportional cover of bed sediment
# A tibble: 824 × 10
presence SegSumT DSDist DSMaxSlope USRainDays USSlope USNative DSDam Method
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <fct>
1 0 16 50.2 0.57 2.47 9.8 0.81 0 electric
2 1 18.7 133. 1.15 1.15 8.3 0.34 0 electric
3 0 18.3 107. 0.57 0.847 0.4 0 0 spo
4 0 16.7 167. 1.72 0.21 0.4 0.22 1 electric
5 1 17.2 3.95 1.15 1.98 21.9 0.96 0 electric
6 0 15.1 11.2 1.72 3.3 25.7 1 0 electric
7 0 12.7 42.4 2.86 0.43 9.6 0.09 0 electric
8 1 18.2 94.4 3.43 0.847 20.5 0.92 0 electric
9 0 14.9 45.7 2.29 2.25 3.9 0.38 0 electric
10 1 18.3 91.9 1.72 0.861 6.7 0.58 1 electric
# ℹ 814 more rows
# ℹ 1 more variable: LocSed <dbl>
Call:
glm(formula = presence ~ SegSumT, family = binomial, data = anguilla_train)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -15.02201 1.53770 -9.769 <2e-16 ***
SegSumT 0.80047 0.08726 9.173 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 627.81 on 617 degrees of freedom
Residual deviance: 501.93 on 616 degrees of freedom
AIC: 505.93
Number of Fisher Scoring iterations: 5
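On the odds scale the slope has a direct interpretation: each additional degree C of summer temperature multiplies the odds of presence by exp(0.80) ≈ 2.2. A quick sketch using the estimates from the summary above:

```r
# Coefficients copied from the glm summary above
b0 = -15.02201
b1 = 0.80047
inv_logit = function(x) 1 / (1 + exp(-x))

exp(b1)                  # ~2.23: odds multiplier per +1 degree C of SegSumT
inv_logit(b0 + b1 * 17)  # predicted P(presence) at SegSumT = 17 (~0.20)
inv_logit(b0 + b1 * 20)  # predicted P(presence) at SegSumT = 20 (~0.73)
```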
# A tibble: 618 × 8
presence SegSumT .fitted .resid .hat .sigma .cooksd .std.resid
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 16.4 0.131 -0.529 0.00260 0.903 0.000197 -0.530
2 1 17.1 0.209 1.77 0.00232 0.901 0.00443 1.77
3 0 14 0.0216 -0.209 0.00231 0.903 0.0000256 -0.209
4 0 18.2 0.389 -0.992 0.00364 0.903 0.00117 -0.994
5 0 15.6 0.0735 -0.391 0.00286 0.903 0.000114 -0.391
6 0 18.3 0.408 -1.02 0.00395 0.902 0.00137 -1.03
7 0 18.5 0.447 -1.09 0.00466 0.902 0.00190 -1.09
8 0 16.2 0.114 -0.491 0.00270 0.903 0.000174 -0.492
9 0 18 0.351 -0.930 0.00313 0.903 0.000853 -0.932
10 1 17.3 0.236 1.70 0.00233 0.901 0.00379 1.70
# ℹ 608 more rows
\[ r_i = \frac{Y_i - E(Y_i)}{\sqrt{Var(Y_i)}} = \frac{Y_i - \hat{p}_i}{\sqrt{\hat{p}_i(1-\hat{p}_i)}} \]
\[ \begin{aligned} d_i = &\text{sign}(Y_i-\hat{p_i}) \, \times \\ &\sqrt{ -2 \left(Y_i \log \hat{p}_i+(1-Y_i)\log (1 - \hat{p}_i) \right) } \end{aligned} \]
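Both residual types can be computed directly from \(Y_i\) and \(\hat{p}_i\) (a sketch following the formulas above; for a fitted `glm` they match `residuals(fit, type = "pearson")` and `residuals(fit, type = "deviance")`):

```r
# Pearson residual: (y - p) standardized by the Bernoulli sd
pearson_resid  = function(y, p) (y - p) / sqrt(p * (1 - p))

# Deviance residual: signed square root of the observation's deviance contribution
deviance_resid = function(y, p) {
  sign(y - p) * sqrt(-2 * (y * log(p) + (1 - y) * log(1 - p)))
}

# An observed presence (y = 1) with fitted probability 0.209,
# as in row 2 of the augmented table above:
deviance_resid(1, 0.209)  # ~1.77, matching the .resid column
pearson_resid(1, 0.209)   # ~1.95
```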
Call: glm(formula = presence ~ SegSumT, family = binomial, data = anguilla_train)
Coefficients:
(Intercept) SegSumT
-15.0220 0.8005
Degrees of Freedom: 617 Total (i.e. Null); 616 Residual
Null Deviance: 627.8
Residual Deviance: 501.9 AIC: 505.9
Call:
glm(formula = presence ~ ., family = binomial, data = anguilla_train)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -9.352885 1.761202 -5.311 1.09e-07 ***
SegSumT 0.654186 0.096921 6.750 1.48e-11 ***
DSDist -0.004837 0.002302 -2.102 0.03559 *
DSMaxSlope -0.030776 0.061995 -0.496 0.61959
USRainDays -0.710920 0.225814 -3.148 0.00164 **
USSlope -0.069814 0.025443 -2.744 0.00607 **
USNative -0.456598 0.455261 -1.003 0.31589
DSDam -1.095360 0.516960 -2.119 0.03410 *
Methodmixture -0.430351 0.475411 -0.905 0.36535
Methodnet -0.066214 0.559162 -0.118 0.90574
Methodspo -1.583905 0.701902 -2.257 0.02403 *
Methodtrap -2.958398 0.688146 -4.299 1.72e-05 ***
LocSed -0.140495 0.096849 -1.451 0.14688
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 627.81 on 617 degrees of freedom
Residual deviance: 426.96 on 605 degrees of freedom
AIC: 452.96
Number of Fisher Scoring iterations: 6
Sta 344/644 - Fall 2023